Machine learning in the real world with multiple objectives

Author: Bolukbasi Tolga
Publication date: 03/07/2018

Machine learning (ML) is ubiquitous in real-world applications. Existing ML systems are based on optimizing a single quality metric such as prediction accuracy. These metrics typically do not fully align with the design constraints encountered in practice, such as computation, latency, fairness, and acquisition costs. In this thesis, we develop ML methods for optimizing prediction accuracy while accounting for such real-world constraints. In particular, we introduce multi-objective learning in two different setups: resource-efficient prediction and algorithmic fairness in language models. First, we focus on decreasing the test-time computational costs of prediction systems. Budget constraints arise in many machine learning problems. Computational costs limit the usage of many models on small devices such as IoT devices or mobile phones and increase the energy consumption in cloud computing. We design systems that allow on-the-fly modification of the prediction model for each input sample. These sample-adaptive systems allow us to leverage the wide variability in sample complexity: we learn policies that select cheap models for low-complexity instances and use descriptive models only for complex ones. We use a multi-objective approach in which one minimizes the system cost while preserving predictive accuracy. We demonstrate significant speed-ups in the fields of computer vision, structured prediction, natural language processing, and deep learning. In the context of fairness, we first demonstrate that a naive application of ML methods runs the risk of amplifying social biases present in data. This danger is particularly acute for methods based on word embeddings, which are increasingly gaining importance in many natural language processing applications of ML. We show that word embeddings trained on Google News articles exhibit female/male gender stereotypes. We demonstrate that, geometrically, gender bias is captured by unique directions in the word embedding vector space. To remove bias, we formulate an empirical risk objective with fairness constraints that removes stereotypes from embeddings while maintaining desired associations. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts.
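
The geometric claim in this abstract, that gender bias lies along particular directions in the embedding space, can be illustrated with a minimal sketch. This is not the constrained empirical risk objective the thesis describes; it swaps in a much simpler projection step. The toy 3-dimensional vectors, the word pairs, and the function names `gender_direction` and `neutralize` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gender_direction(emb, pairs):
    """Unit vector along the mean (female - male) difference over the given word pairs."""
    diffs = [emb[f] - emb[m] for f, m in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def neutralize(vec, direction):
    """Remove the component of vec along direction, then rescale to unit length."""
    out = vec - np.dot(vec, direction) * direction
    return out / np.linalg.norm(out)

# Hypothetical toy embeddings (real embeddings have hundreds of dimensions).
emb = {
    "she": np.array([1.0, 0.2, 0.0]),
    "he": np.array([-1.0, 0.2, 0.0]),
    "woman": np.array([0.9, 0.1, 0.3]),
    "man": np.array([-0.9, 0.1, 0.3]),
    "nurse": np.array([0.5, 0.8, 0.1]),
}

g = gender_direction(emb, [("she", "he"), ("woman", "man")])
before = float(np.dot(emb["nurse"], g))   # projection onto the gender direction
debiased = neutralize(emb["nurse"], g)
after = float(np.dot(debiased, g))        # approximately zero after projection
```

In this sketch a profession word keeps its remaining coordinates but loses its component along the estimated direction, which is one simple way to read "removing stereotypes while maintaining desired associations".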

Boston University Institutional Repository (OpenBU)

Resource Constrained Structured Prediction

Author: Bolukbasi Tolga
Chang Kai-Wei
Saligrama Venkatesh
Wang Joseph
Publication date: 07/06/2016

We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach adaptively generates computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reducing feature costs without degrading accuracy.
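
The idea of paying for costly features only when an input needs them can be sketched as a two-stage predictor. This is an assumption-laden illustration, not the paper's method: the paper learns the selection policy through structured learning, whereas the sketch below uses a fixed confidence threshold. The function `adaptive_predict`, the toy models, and the cost values are hypothetical.

```python
from typing import Callable, Tuple

# A model maps an input to (label, confidence in [0, 1]).
Model = Callable[[float], Tuple[int, float]]

def adaptive_predict(x: float, cheap: Model, costly: Model, threshold: float,
                     cheap_cost: float = 1.0, costly_cost: float = 10.0) -> Tuple[int, float]:
    """Return (label, cost spent). The costly model runs only for low-confidence inputs."""
    label, confidence = cheap(x)
    if confidence >= threshold:
        return label, cheap_cost
    label, _ = costly(x)
    return label, cheap_cost + costly_cost

def cheap_model(x: float) -> Tuple[int, float]:
    # Toy rule: sign of x, with confidence growing with distance from zero.
    return int(x > 0), min(1.0, abs(x))

def costly_model(x: float) -> Tuple[int, float]:
    # Toy stand-in for a model that uses expensive features.
    return int(x > 0.1), 1.0

inputs = [2.0, -1.5, 0.05, 0.3]
results = [adaptive_predict(x, cheap_model, costly_model, threshold=0.5) for x in inputs]
labels = [label for label, _ in results]
total_cost = sum(cost for _, cost in results)
always_costly = len(inputs) * (1.0 + 10.0)  # cost if every input used both stages
```

With these toy values, only the two inputs near the decision boundary trigger the costly stage, so the total cost stays below the cost of running both stages on every input.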

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Association for the Advancement of Artificial Intelligence: AAAI Publications

Quantifying and Reducing Stereotypes in Word Embeddings

Author: Bolukbasi Tolga
Chang Kai-Wei
Kalai Adam
Saligrama Venkatesh
Zou James
Publication date: 01/01/2016

Machine learning algorithms are optimized to model statistical properties of the training data. If the input data reflects stereotypes and biases of the broader society, then the output of the learning algorithm also captures these stereotypes. In this paper, we initiate the study of gender stereotypes in word embeddings, a popular framework for representing text data. As their use becomes increasingly common, applications can inadvertently amplify unwanted stereotypes. We show across multiple datasets that the embeddings contain significant gender stereotypes, especially with regard to professions. We created a novel gender analogy task and combined it with crowdsourcing to systematically quantify the gender bias in a given embedding. We developed an efficient algorithm that reduces gender stereotypes using just a handful of training examples while preserving the useful geometric properties of the embedding. We evaluated our algorithm on several metrics. While we focus on male/female stereotypes, our framework may be applicable to other types of embedding biases. Comment: presented at 2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, N

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Do Neural Ranking Models Intensify Gender Bias?

Author: Bolukbasi Tolga
Devlin Jacob
Kulshrestha Juhi
Nguyen Tri
Pang Liang
Sebastian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/06/2020

Concerns regarding the footprint of societal biases in information retrieval (IR) systems have been raised in several previous studies. In this work, we examine various recent IR models from the perspective of the degree of gender bias in their retrieval results. To this end, we first provide a bias measurement framework which includes two metrics to quantify the degree of the unbalanced presence of gender-related concepts in a given IR model's ranking list. To examine IR models by means of the framework, we create a dataset of non-gendered queries, selected by human annotators. Applying these queries to the MS MARCO Passage retrieval collection, we then measure the gender bias of a BM25 model and several recent neural ranking models. The results show that while all models are strongly biased toward male, the neural models, and in particular the ones based on contextualized embedding models, significantly intensify gender bias. Our experiments also show an overall increase in the gender bias of neural models when they exploit transfer learning, namely when they use (already biased) pre-trained embeddings. Comment: In Proceedings of ACM SIGIR 202
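
To make the notion of an "unbalanced presence of gender-related concepts" in a ranking list concrete, here is a minimal sketch of one possible score. It is not either of the two metrics defined in the paper, which the abstract does not specify. The word lists, the function names, and the choice of averaging a male-minus-female term count over the top k results are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny gendered-term lists.
FEMALE_TERMS = {"she", "her", "woman", "women"}
MALE_TERMS = {"he", "his", "him", "man", "men"}

def gender_counts(text):
    """Count female and male terms in a whitespace-tokenized, lowercased document."""
    tokens = text.lower().split()
    female = sum(t in FEMALE_TERMS for t in tokens)
    male = sum(t in MALE_TERMS for t in tokens)
    return female, male

def rank_bias(ranked_docs, k):
    """Mean (male - female) term count over the top-k documents; positive means male-leaning."""
    top = ranked_docs[:k]
    if not top:
        return 0.0
    total = 0
    for doc in top:
        female, male = gender_counts(doc)
        total += male - female
    return total / len(top)

# Toy ranking list for a non-gendered query.
docs = [
    "he said his plan works",
    "she left",
    "the man and the woman met",
]
bias_at_1 = rank_bias(docs, 1)
bias_at_3 = rank_bias(docs, 3)
```

A score like this can be computed for each model's ranking of the same query set and then compared across models, which mirrors the comparison between BM25 and neural rankers described in the abstract.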

arXiv.org e-Print Archive

Crossref

The impact of the EU on Turkey: Toward streamlining Europeanisation as a research programme

Author: Ertugal E.
Ozcurumez S.
Tolga Bolukbasi H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

This article provides a reassessment of the literature on the transformative impact of the EU on Turkey through the lens of the Europeanisation research programme. It relies on systematic examination of a sample of the literature based on substantive findings, research design and methods. It suggests that this sample displays limitations characteristic of the Europeanisation research programme and proposes to remedy these limitations by applying the research design and methods used therein for generating empirically based comparative research on Turkey. © 2010 European Consortium for Political Research

Bilkent University Institutional Repository